Residual diagnostics

Fitted values:

1. Each observation in a time series can be forecast using all previous observations; these one-step predictions are called fitted values.
2. Fitted values are often not true forecasts, because any parameters involved in the forecasting method are estimated using all the available observations in the series, including future ones (e.g., the average and drift methods). The naive and seasonal naive methods involve no parameters, so their fitted values are true forecasts.

Residuals: a residual is what is left over after fitting a model: \[e_t = y_t - \hat{y}_t\] where \(\hat{y}_t\) is the fitted value. If a model fits the data well, then:
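As a minimal base-R sketch (toy data, no packages assumed): for the naive method the fitted value at time t is simply the previous observation, so the residuals are just the first differences of the series.

```r
# For the naive method the fitted value at time t is y[t - 1],
# so the residual is e_t = y_t - y_{t-1} (the first difference).
naive_residuals <- function(y) {
  c(NA, diff(y))  # the first residual is undefined: no earlier observation
}

y <- c(10, 12, 11, 15, 14)  # toy series (hypothetical values)
e <- naive_residuals(y)     # NA  2 -1  4 -1
```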

1. The residuals are uncorrelated. If there are correlations between residuals, then there is information left in the residuals which should have been used in computing the forecasts.
2. The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts are biased. Adjusting for bias is easy: if the residuals have mean \(m\), simply add \(m\) to all forecasts and the bias problem is solved.
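The bias adjustment in point 2 can be sketched with toy numbers (all values hypothetical): estimate the residual mean and add it to every forecast.

```r
e <- c(1.2, 0.8, 1.5, 0.9, 1.1)  # toy residuals with a clearly non-zero mean
m <- mean(e)                     # estimated bias, here 1.1

fc     <- c(100, 102, 104)       # toy (biased) forecasts
fc_adj <- fc + m                 # bias-adjusted forecasts: 101.1 103.1 105.1
```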

But more than one forecasting method may satisfy these properties, so they cannot by themselves be used to select a forecasting method.

Two further useful (but not necessary) properties to check are: 1. The residuals have constant variance. 2. The residuals are normally distributed.

These two properties make the calculation of prediction intervals easier; when they fail, the forecasts themselves are unaffected, but prediction intervals computed under those assumptions may be inaccurate.

Now we will see an example of residual analysis.

library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
autoplot(goog200) +
  xlab("Day") + ylab("Closing Price (in $)") +
  ggtitle("Google Stock (daily ending 6 December 2013)")

For stock prices, the naive method is often the best available.

autoplot(residuals(naive(goog200))) + 
  xlab("Day") + ylab("") +
  ggtitle("Residuals from the Naive Method")

The graph above shows that the mean of the residuals is close to zero. The variation of the residuals also stays roughly the same across the historical data, apart from one outlier.

gghistogram(residuals(naive(goog200))) + ggtitle("Histogram of the residuals")
## Warning: Removed 1 rows containing non-finite values (stat_bin).

The histogram shows that the residuals are not normally distributed: there are some extreme observations on the right (the distribution is positively skewed). Prediction intervals computed assuming normally distributed residuals may therefore be inaccurate.

ggAcf(residuals(naive(goog200))) + ggtitle("ACF of residuals")

The autocorrelation is not significant for any lag.

From the plots above, it appears that the naive forecasts account for all the available information.

Portmanteau tests for autocorrelation: one issue with ACF plots is that we test a hypothesis at each lag, and each test has some probability of a false positive. If h is large, it is likely that at least one test gives a wrong result. To overcome this, we instead test whether the first h autocorrelations, taken as a group, are significantly different from what is expected of a white noise process.

Box-Pierce test: \[ Q = T \sum_{k=1}^{h} r_{k}^{2}\ ,\] where h is the maximum lag being considered (usually h = 10 for non-seasonal data and h = 2m for seasonal data with seasonal period m) and T is the number of observations. The test does not perform well if h is too large, so if these values exceed T/5, use h = T/5.

Ljung-Box test: \[Q = T(T+2) \sum_{k=1}^{h}(T-k)^{-1} r_{k}^{2}\] If the autocorrelations come from a white noise series, then both Q statistics follow a chi-squared distribution with (h - K) degrees of freedom, where K is the number of parameters estimated in the model.
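Both statistics can be computed directly from the sample autocorrelations. A base-R sketch on simulated white noise, cross-checked against base R's `Box.test`:

```r
set.seed(42)
y <- rnorm(100)  # simulated white noise, so Q should be unremarkable
T <- length(y)
h <- 10
r <- acf(y, lag.max = h, plot = FALSE)$acf[-1]  # r_1, ..., r_h

Q_bp <- T * sum(r^2)                        # Box-Pierce statistic
Q_lb <- T * (T + 2) * sum(r^2 / (T - 1:h))  # Ljung-Box statistic

# Box.test implements both and returns a chi-squared p-value
p_lb <- Box.test(y, lag = h, type = "Ljung-Box")$p.value
```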

Box.test(residuals(naive(goog200)), lag = 10, fitdf = 0)
## 
##  Box-Pierce test
## 
## data:  residuals(naive(goog200))
## X-squared = 10.611, df = 10, p-value = 0.3886
Box.test(residuals(naive(goog200)), lag = 10, fitdf = 0, type = "Lj")
## 
##  Box-Ljung test
## 
## data:  residuals(naive(goog200))
## X-squared = 11.031, df = 10, p-value = 0.3551

The p-values are quite high, which means the residuals are not distinguishable from a white noise series.

All of the above analysis can be done using just one function.

checkresiduals(naive(goog200))

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 11.031, df = 10, p-value = 0.3551
## 
## Model df: 0.   Total lags used: 10

Forecast accuracy

The size of the residuals is not a reliable indication of how large true forecast errors are likely to be. The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model.

We divide the dataset into a training set (usually about 80%) and a test set (the remaining 20%). The forecast error is \(e_{T+h} = y_{T+h} - \hat{y}_{T+h}\). Forecast errors are different from residuals, since residuals are calculated on the training data.

Mean absolute error: MAE = mean(|\(e_t\)|). Root mean squared error: RMSE = \(\sqrt{mean(e_t^2)}\).

A method that minimises the MAE leads to forecasts of the median, while a method that minimises the RMSE leads to forecasts of the mean.

The issue with MAE and RMSE is that they are on the same scale as the data, so we cannot compare the accuracy of methods across datasets measured in different units.
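The scale dependence is easy to see in a toy example: expressing the same errors in different units changes MAE and RMSE by the unit-conversion factor.

```r
mae  <- function(e) mean(abs(e))
rmse <- function(e) sqrt(mean(e^2))

e_metres <- c(2, -1, 3, -2)   # toy errors measured in metres
e_km     <- e_metres / 1000   # the very same errors in kilometres

mae(e_metres)  # 2
mae(e_km)      # 0.002: same forecasts, very different-looking score
```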

Mean absolute percentage error: MAPE = mean(|\(p_t\)|), where \(p_t = 100e_t/y_t\).

Disadvantages of using percentage errors:

1. Percentages are infinite or undefined when \(y_t = 0\), and take extreme values when \(y_t\) is close to zero.
2. Percentage errors assume the measurement scale has a meaningful zero, so they make sense only for ratio-scale data, not for interval-scale data.
3. Percentage errors usually put a heavier penalty on negative errors than on positive errors.

To address these issues, we might use the symmetric MAPE: sMAPE = mean( 200 \(|y_t - \hat{y}_t|/(y_t + \hat{y}_t)\) ). This still does not resolve issue (1) above.
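MAPE and sMAPE follow directly from the definitions above; a base-R sketch on toy data (all values hypothetical):

```r
mape  <- function(y, f) mean(abs(100 * (y - f) / y))
smape <- function(y, f) mean(200 * abs(y - f) / (y + f))

y <- c(100, 200, 400)  # toy actuals
f <- c( 90, 220, 380)  # toy forecasts
# mape(y, f) is (10 + 10 + 5) / 3; sMAPE still misbehaves when y + f is near zero
```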

Scaled errors: to compare forecasts across different time series, we can use scaled errors. For non-seasonal data the scaled error is \[q_j = \frac{e_j}{\frac{1}{T-1} \sum_{t=2}^{T} |y_t - y_{t-1}|}\] The denominator is the in-sample MAE of the naive forecast method. 1. \(q_j\) is independent of the units of the data. 2. \(q_j\) is less than 1 if the forecast is better than the average naive forecast, and greater than 1 if it is worse.

For seasonal data the scaled error is \[q_j = \frac{e_j}{\frac{1}{T-m} \sum_{t=m+1}^{T} |y_t - y_{t-m}|}\] where the denominator is the in-sample MAE of the seasonal naive method. The mean absolute scaled error is MASE = mean(|\(q_j\)|).
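A base-R sketch of MASE for the non-seasonal case (toy numbers): scale the test errors by the in-sample MAE of one-step naive forecasts.

```r
# MASE for non-seasonal data: the scaling factor is the mean absolute
# first difference of the training data (the in-sample naive MAE).
mase <- function(train, actual, forecast) {
  scale <- mean(abs(diff(train)))
  mean(abs(actual - forecast) / scale)
}

train    <- c(10, 12, 11, 15, 14)  # toy training series
actual   <- c(16, 17)              # toy test observations
forecast <- c(14, 14)              # naive forecasts from the last training value
# scale = 2, errors = 2 and 3, so MASE = 1.25 (worse than in-sample naive)
```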

We will see an example of this on Quarterly Beer Production data.

beer2 = window(ausbeer, start = 1992, end = c(2007,4))
beerfit1 = meanf(beer2, h = 10)
beerfit2 = rwf(beer2, h = 10)
beerfit3 = snaive(beer2, h = 10)
autoplot(window(ausbeer, start = 1992)) +
  autolayer(beerfit1, series = "Mean", PI = FALSE) +
  autolayer(beerfit2, series = "Naive", PI = FALSE) +
  autolayer(beerfit3, series = "Seasonal Naive", PI = FALSE) +
  ggtitle("Quarterly beer production data") +
  xlab("Years") + ylab("Megalitres") +
  guides(colour = guide_legend(title = "Forecast"))

The graph above suggests that the seasonal naive forecasts track the data most closely. We will confirm this using the error measures.

beer3 = window(ausbeer, start = 2008)
accuracy(beerfit1, beer3)
##                   ME     RMSE      MAE        MPE     MAPE     MASE
## Training set   0.000 43.62858 35.23438 -0.9365102 7.886776 2.463942
## Test set     -13.775 38.44724 34.82500 -3.9698659 8.283390 2.435315
##                     ACF1 Theil's U
## Training set -0.10915105        NA
## Test set     -0.06905715  0.801254
accuracy(beerfit2, beer3)
##                       ME     RMSE      MAE         MPE     MAPE     MASE
## Training set   0.4761905 65.31511 54.73016  -0.9162496 12.16415 3.827284
## Test set     -51.4000000 62.69290 57.40000 -12.9549160 14.18442 4.013986
##                     ACF1 Theil's U
## Training set -0.24098292        NA
## Test set     -0.06905715  1.254009
accuracy(beerfit3, beer3)
##                     ME     RMSE  MAE        MPE     MAPE      MASE
## Training set -2.133333 16.78193 14.3 -0.5537713 3.313685 1.0000000
## Test set      5.200000 14.31084 13.4  1.1475536 3.168503 0.9370629
##                    ACF1 Theil's U
## Training set -0.2876333        NA
## Test set      0.1318407  0.298728

The seasonal naive method has the smallest test error on every measure. Sometimes, however, different error measures lead to different conclusions about which forecasting method is best.

Now we will look at a non-seasonal example: the Google stock price index.

googfc1 = meanf(goog200, h = 40)
googfc2 = rwf(goog200, h = 40)
googfc3 = rwf(goog200, drift = TRUE, h = 40)
autoplot(subset(goog, end = 240)) +
  autolayer(googfc1, series = "Mean", PI = FALSE) +
  autolayer(googfc2, series = "Naive", PI = FALSE) +
  autolayer(googfc3, series = "Drift", PI = FALSE) +
  ggtitle("Google Daily Stock Price Index") +
  xlab("Day") + ylab("Closing Price(in $)") +
  guides(colour = guide_legend(title = "Forecast"))

The above graph shows that the drift method is giving the best results.

googtest = window(goog, start = 201, end = 240)
accuracy(googfc1, googtest)
##                         ME      RMSE       MAE        MPE     MAPE
## Training set -4.296286e-15  36.91961  26.86941 -0.6596884  5.95376
## Test set      1.132697e+02 114.21375 113.26971 20.3222979 20.32230
##                   MASE      ACF1 Theil's U
## Training set  7.182995 0.9668981        NA
## Test set     30.280376 0.8104340  13.92142
accuracy(googfc2, googtest)
##                      ME      RMSE       MAE       MPE      MAPE     MASE
## Training set  0.6967249  6.208148  3.740697 0.1426616 0.8437137 1.000000
## Test set     24.3677328 28.434837 24.593517 4.3171356 4.3599811 6.574582
##                     ACF1 Theil's U
## Training set -0.06038617        NA
## Test set      0.81043397  3.451903
accuracy(googfc3, googtest)
##                        ME      RMSE       MAE         MPE      MAPE
## Training set 4.653165e-13  6.168928  3.824406 -0.01570676 0.8630093
## Test set     1.008487e+01 14.077291 11.667241  1.77566103 2.0700918
##                  MASE        ACF1 Theil's U
## Training set 1.022378 -0.06038617        NA
## Test set     3.119002  0.64732736  1.709275

All the error measures show that the drift method gives the best forecasts.

Time series cross-validation: in this method we perform a series of evaluations, where each test set contains a single observation and the observations preceding it form the training set. This way no future observations are used in constructing the forecast. The earliest observations are not used as test sets, since forecasts based on a very small training set are not reliable.
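What tsCV() does for h = 1 can be sketched in base R for a naive forecast (toy data; the real function is far more general):

```r
# Rolling-origin CV for a one-step naive forecast: at each origin t we
# forecast y[t + 1] with y[t], never touching observations after t.
ts_cv_naive <- function(y, initial = 3) {
  e <- rep(NA_real_, length(y))
  for (t in initial:(length(y) - 1)) {
    e[t + 1] <- y[t + 1] - y[t]  # error of the forecast made at origin t
  }
  e
}

y <- c(10, 12, 11, 15, 14, 16)  # toy series
e <- ts_cv_naive(y)             # NA NA NA  4 -1  2
cv_rmse <- sqrt(mean(e^2, na.rm = TRUE))
```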

e = tsCV(goog200, rwf, drift = TRUE, h = 1)
sqrt(mean(e^2, na.rm = TRUE))
## [1] 6.233245

6.233245 is our cross-validated RMSE.

sqrt(mean(residuals(rwf(goog200, drift = TRUE))^2, na.rm  = TRUE))
## [1] 6.168928

The RMSE computed from the residuals is smaller, as expected, because the fitted values are based on parameters estimated from all the available observations, including future ones.

We can also do a multistep forecast cross validation.

e = tsCV(goog200, forecastfunction = naive, h = 8)

mse = colMeans(e^2, na.rm = TRUE)

data.frame(h = 1:8, MSE = mse) %>% 
  ggplot(aes(x = h, y = MSE)) + geom_point()

The plot shows that the forecast error increases as the forecast horizon increases.

Exercises:

5). Australian beer data: the data are seasonal in nature, so we use the seasonal naive forecast.

beer <- window(ausbeer, start=1992)
fc <- snaive(beer)
autoplot(fc)

res <- residuals(fc)
autoplot(res)

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 32.269, df = 8, p-value = 8.336e-05
## 
## Model df: 0.   Total lags used: 8

The residuals look negatively skewed; the histogram shows that they are not normally distributed.

The p-value is very small, so we conclude that the residuals are autocorrelated; the ACF plot shows a spike at lag 4.

The residuals are scattered around zero, but some spikes are visible on the negative side. Overall the model does reasonably well, but it can be improved.

6). Internet Usage per minute.

autoplot(WWWusage) +
  ggtitle("Internet Usage per Minute") +
  xlab("Minutes") + ylab("No. of Users")

The data show no seasonality, so we use the naive method to fit them.

fc = naive(WWWusage)
res = residuals(fc)
autoplot(res)

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 145.58, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10

The residual plot shows that the residuals are not random; there is likely correlation between them, and the ACF plot indicates the same. Overall, the naive method does not fit the data well.

Quarterly clay brick production :-

autoplot(bricksq) +
  ggtitle("Quarterly clay brick production") +
  xlab("Quarter")

The data show some seasonal patterns, so we use the seasonal naive method.

fc = snaive(bricksq)
res = residuals(fc)
autoplot(res)

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 233.2, df = 8, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 8

The residual plot shows that the residuals have increasing variance. We also heavily overestimate in some cases, producing spikes on the negative side.

The ACF plot also shows correlation among the residuals.

The seasonal naive method does not fit the data well.

Retail Data :-

retaildata = readxl::read_excel("/Users/atyagi/Desktop/Time Series Forecasting/Time-Series-Forecasting/Time_series_data/retail.xlsx",skip =1)
## readxl works best with a newer version of the tibble package.
## You currently have tibble v1.4.2.
## Falling back to column name repair from tibble <= v1.4.2.
## Message displays once per session.
myts = ts(retaildata[,"A3349873A"], start = c(1984,4), frequency = 12)

Splitting the dataset into training and test sets.

myts.train = window(myts, end = c(2010,12))
myts.test = window(myts, start = 2011)
autoplot(myts) +
  autolayer(myts.train, series = "Training") +
  autolayer(myts.test, series = "Test")

The dataset is seasonal in nature, so we will apply the seasonal naive method.

fc = snaive(myts.train)
accuracy(fc, myts.test)
##                     ME     RMSE      MAE       MPE     MAPE     MASE
## Training set  8.866667 19.60898 15.41100 5.2561804 8.116672 1.000000
## Test set     -0.237500 25.09626 18.47083 0.1008644 6.074517 1.198548
##                   ACF1 Theil's U
## Training set 0.7165701        NA
## Test set     0.4798145 0.7159931
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 567.7, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The residual plots show that the mean of the residuals is greater than zero and that their variance increases over time. The residuals appear roughly normally distributed but with a high peak. They are also correlated, as can be seen in the ACF plot and the Ljung-Box test.

The accuracy measures also give somewhat contradictory results: RMSE and MASE increase on the test set, while MAPE decreases. This happens because most of the residuals are positive and, as mentioned earlier, percentage errors penalise positive and negative errors asymmetrically, which can be misleading when the test data are not very similar to the training data.

9). Quarterly visitor nights for various regions of Australia: total quarterly visitor nights (in millions) from 1998 to 2016 for twenty regions of Australia within six states. We will pick just one region, QLDMetro.

autoplot(visnights[,"QLDMetro"]) 

train1= window(visnights[,"QLDMetro"], end = c(2015,4))
train2= window(visnights[,"QLDMetro"], end = c(2014,4))
train3= window(visnights[,"QLDMetro"], end = c(2013,4))
fc1 = snaive(train1)
accuracy(fc1,window(visnights[,"QLDMetro"], start = 2016, end = c(2016,4)))
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set 0.02006107 1.0462821 0.8475553 -0.2237701 7.976760 1.0000000
## Test set     0.56983879 0.9358727 0.7094002  4.6191866 6.159821 0.8369957
##                    ACF1 Theil's U
## Training set 0.06014484        NA
## Test set     0.09003153 0.4842061
fc2 = snaive(train2)
accuracy(fc2,window(visnights[,"QLDMetro"], start = 2015, end = c(2015,4)))
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set 0.01618760 1.0735582 0.8809432 -0.2744747 8.284216 1.0000000
## Test set     0.08203656 0.4117902 0.3133488  0.5875037 3.057463 0.3556969
##                     ACF1 Theil's U
## Training set  0.06610879        NA
## Test set     -0.59903247 0.3336559
fc3 = snaive(train3)
accuracy(fc3,window(visnights[,"QLDMetro"], start = 2014, end = c(2014,4)))
##                        ME     RMSE       MAE        MPE     MAPE      MASE
## Training set -0.007455407 1.074544 0.8821694 -0.5625865 8.271365 1.0000000
## Test set      0.370832661 1.058658 0.8625501  4.0472032 8.476977 0.9777602
##                     ACF1 Theil's U
## Training set  0.07317746        NA
## Test set     -0.36890062  1.177346

IBM Stock Prices :-

autoplot(ibmclose) +
  ggtitle("IBM Stock Prices") +
  ylab("Stock Price")

The data have a downward trend and also a cyclic pattern.

train_ibm = window(ibmclose, end = 300)
test_ibm = window(ibmclose, start = 301)
fc1 = meanf(train_ibm)
fc2 = naive(train_ibm)
fc3 = rwf(train_ibm, drift = TRUE)
accuracy(fc1, test_ibm)
##                         ME      RMSE       MAE        MPE     MAPE
## Training set  1.660438e-14  73.61532  58.72231  -2.642058 13.03019
## Test set     -1.220933e+02 122.18978 122.09333 -32.083817 32.08382
##                  MASE       ACF1 Theil's U
## Training set 11.52098 0.98957790        NA
## Test set     23.95401 0.05670628  18.78547
accuracy(fc2, test_ibm)
##                      ME     RMSE     MAE         MPE     MAPE     MASE
## Training set -0.2809365 7.302815 5.09699 -0.08262872 1.115844 1.000000
## Test set      4.8000000 6.826419 5.40000  1.24443538 1.405293 1.059449
##                    ACF1 Theil's U
## Training set 0.13510517        NA
## Test set     0.05670628 0.9444433
accuracy(fc3, test_ibm)
##                         ME     RMSE      MAE         MPE     MAPE     MASE
## Training set -3.916293e-14 7.297409 5.127996 -0.02530123 1.121650 1.006083
## Test set      6.345151e+00 7.708126 6.551839  1.65200211 1.707415 1.285433
##                     ACF1 Theil's U
## Training set  0.13510517        NA
## Test set     -0.08172572  1.100355

The naive method gives the best test accuracy of the three. Drift performs a little worse because the series starts increasing again in the test set, probably due to the cyclic pattern visible in the time plot above.

checkresiduals(fc2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 22.555, df = 10, p-value = 0.01251
## 
## Model df: 0.   Total lags used: 10

The residuals look fine at the start but grow in size as time goes on. The Ljung-Box test also shows that the residuals are correlated, which is visible in the ACF plot as well.

Housing Sales Data:-

autoplot(hsales) +
  ggtitle("Sales of one-family houses") 

The data looks seasonal with a cyclic pattern also present.

train_hsales = window(hsales, end = c(1993,12))
test_hsales = window(hsales, start = 1994)
fc1 = meanf(train_hsales)
fc2 = naive(train_hsales)
fc3 = snaive(train_hsales)
accuracy(fc1, test_hsales)
##                        ME     RMSE      MAE       MPE     MAPE     MASE
## Training set 2.480763e-15 12.13880 9.498898 -6.120182 20.30851 1.119163
## Test set     6.451587e+00 10.00315 7.841270  9.521209 12.60939 0.923861
##                   ACF1 Theil's U
## Training set 0.8661515        NA
## Test set     0.2475017  1.074061
accuracy(fc2, test_hsales)
##                       ME      RMSE      MAE        MPE      MAPE      MASE
## Training set -0.01593625  6.289813 4.988048 -0.7800232  9.880157 0.5876934
## Test set      7.40000000 10.639549 8.600000 11.1730634 13.839730 1.0132548
##                   ACF1 Theil's U
## Training set 0.1829708        NA
## Test set     0.2475017   1.15566
accuracy(fc3, test_hsales)
##                     ME      RMSE    MAE        MPE     MAPE      MASE
## Training set 0.1375000 10.576113 8.4875 -2.1016380 17.63375 1.0000000
## Test set     0.3043478  6.160886 5.0000 -0.7312374  9.12828 0.5891016
##                  ACF1 Theil's U
## Training set 0.838108        NA
## Test set     0.224307 0.8031005

As expected, seasonal naive method gives the best results as it takes care of the seasonality present in the data.

checkresiduals(fc3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 684.66, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

We can see that the residuals show a cyclic pattern, because the cyclic behaviour still exists in the series. The residuals are bimodal and negatively skewed, and they are correlated, as can be seen in the ACF plot.